library(tidyverse)
library(reactable)
library(purrr)

TidyVerse EXTEND Assignment

Instructions

In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.

GitHub repository: https://github.com/acatlin/FALL2024TIDYVERSE

FiveThirtyEight.com datasets

Kaggle datasets

Your task here is to Extend an Existing Example. Using one of your classmate’s examples (as created above), extend his or her example with additional annotated code. (15 points)

You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. You should also update the README.md file with your example.

After you’ve extended your classmate’s vignette, please submit your GitHub handle name in the submission link provided below. This will let your instructor know that your work is ready to be peer-graded.

You should complete your submission on the schedule stated in the course syllabus.

Data Import

In this step, I import the World Happiness Report dataset from my GitHub repository:

worldhappiness <- read.csv(file = "https://raw.githubusercontent.com/tenzinda97/TidyVerse/refs/heads/main/world-happiness-report.csv")
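The column names used throughout this vignette (for example Healthy.life.expectancy.at.birth) contain dots because read.csv() sanitizes header names by default. The sketch below assumes the raw CSV header uses spaces, as the dotted names suggest, and uses a hypothetical two-line CSV with placeholder values to show the conversion. Passing check.names = FALSE keeps the original names instead.

```r
# Hypothetical CSV text: a header with spaces and one placeholder row
csv_text <- "Country name,Log GDP per capita,Life Ladder
Country A,9.5,6.0"

# Default check.names = TRUE: make.names() replaces spaces with dots
happy_dotted <- read.csv(text = csv_text)
names(happy_dotted)
# returns c("Country.name", "Log.GDP.per.capita", "Life.Ladder")

# check.names = FALSE keeps the header exactly as written in the file
happy_raw <- read.csv(text = csv_text, check.names = FALSE)
names(happy_raw)
# returns c("Country name", "Log GDP per capita", "Life Ladder")
```

With check.names = FALSE, columns with spaces must be referenced with backticks, e.g. `` `Life Ladder` ``.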

Data Filtering and Mapping

First I will filter the data for a specific year.

worldhappiness2020 <- worldhappiness %>% 
  filter(year == 2020)  # year is numeric, so compare with 2020 rather than the string '2020'

Filtering on year == 2020 restricts the rest of this analysis to observations from 2020 only.
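A small sketch of the same filter() call on hypothetical toy data (the country labels and values are placeholders). It also shows why the quoted form '2020' happens to work: R coerces the numeric year to character before comparing, so both calls keep the same rows, but the numeric comparison states the intent more clearly.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: two countries observed in two years
toy <- tibble(
  Country.name = c("A", "A", "B", "B"),
  year = c(2019L, 2020L, 2019L, 2020L),
  Life.Ladder = c(5.1, 5.3, 6.0, 6.2)
)

# Numeric comparison: types match, no coercion needed
toy_2020 <- toy %>% filter(year == 2020)

# Character comparison: year is coerced to "2019"/"2020" before ==
toy_2020_chr <- toy %>% filter(year == "2020")
```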

Calculating the Average

Next, I calculate the average healthy life expectancy at birth across all countries in 2020, using na.rm = TRUE to skip missing values:

mean(worldhappiness2020$Healthy.life.expectancy.at.birth, na.rm = TRUE)
## [1] 67.09957
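The na.rm = TRUE argument matters because the 2020 column contains missing values. A minimal sketch with a hypothetical three-value vector shows the difference: without na.rm, a single NA makes the whole mean NA.

```r
# Hypothetical values standing in for a column with one missing entry
life <- c(69.3, NA, 74.2)

with_na <- mean(life)                   # NA: the missing value propagates
without_na <- mean(life, na.rm = TRUE)  # 71.75: NA is dropped before averaging
```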

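Purrr map Function

Next, I apply the map function from the purrr package to the 2020 healthy life expectancy values in the World Happiness dataset. Note that map_dbl() calls mean() on each element of the vector separately, so each result is simply the original value (and missing values stay NA). The output below therefore lists each country's value rather than one average; it illustrates how map_dbl() iterates over a vector and returns a double vector.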

worldhappiness2020$Healthy.life.expectancy.at.birth %>% map_dbl(mean)
##  [1] 69.30 69.20 74.20 73.60 69.70 65.30 72.40 55.10 64.20 68.40 66.80 67.20
## [13] 62.40 54.30 74.00 70.10 69.90 68.30 71.40 74.10 71.30 73.00 66.40 69.10
## [25] 62.30 66.70 69.00 59.50 72.10 74.20 64.10 72.80 58.00 72.80    NA 68.40
## [37] 73.00 60.90 66.60 61.40 72.50 73.70 74.00 50.70 75.20 67.20 65.80 61.30
## [49]    NA 64.70 59.50 67.40 68.50 72.20 67.00 68.90 66.40 62.70 68.90 66.50
## [61] 59.60 57.10 72.50 73.60 50.50 65.56 73.40 62.10 70.10 72.80 65.10 66.90
## [73] 69.00 69.50 71.70 57.30 74.20 75.00 72.80 74.70    NA 64.70 58.50 67.60
## [85] 67.50 67.60 56.50 65.20 67.50 72.70 68.10 69.20 66.90 56.30 56.80
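To make the contrast concrete, this sketch (hypothetical values) compares map_dbl(), which applies mean() to every element one at a time, with a single mean() call over the whole vector.

```r
library(purrr)

# Hypothetical life-expectancy values, including one missing entry
life <- c(69.3, 55.1, NA)

# map_dbl() calls mean() once per element, so each "mean" is of one number
per_element <- map_dbl(life, mean)

# mean() on the whole vector gives the single cross-country average
overall <- mean(life, na.rm = TRUE)
```

per_element is c(69.3, 55.1, NA), identical to the input, while overall is 62.2.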

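Next, I extend the same idea to multiple columns. Because a data frame is a list of columns, map() applies the function to each selected column and returns a named list. Note that this step uses the full dataset (all years), not just 2020: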

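worldhappiness %>% 
  # keep only the two columns to summarize
  select(Healthy.life.expectancy.at.birth, Freedom.to.make.life.choices) %>% 
  # map() treats the data frame as a list of columns and returns a named list
  map(~ mean(.x, na.rm = TRUE))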
## $Healthy.life.expectancy.at.birth
## [1] 63.35937
## 
## $Freedom.to.make.life.choices
## [1] 0.7425576
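If a flat result is more convenient than a list, two common alternatives are map_dbl(), which returns a named numeric vector, and dplyr's summarise(across()), which returns a one-row tibble. A sketch on hypothetical toy data that reuses the dataset's column names:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data with the same column names as worldhappiness
toy <- tibble(
  Healthy.life.expectancy.at.birth = c(60, 70, NA),
  Freedom.to.make.life.choices = c(0.5, 0.9, 0.7)
)

# map_dbl(): one named double per column
col_means <- map_dbl(toy, ~ mean(.x, na.rm = TRUE))

# summarise(across()): the same means, kept as a one-row tibble
col_means_tbl <- toy %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
```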

Exploring the map Function Further

Below I use the map function a bit more. I split the original data frame by year, fit a linear model of healthy life expectancy on log GDP per capita for each year, apply the summary function to each fitted model, and then use map_df() to extract the R-squared value for each year into a table.

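worldhappiness %>%
  split(.$year) %>%                # list of data frames, one per year
  map(~ lm(Healthy.life.expectancy.at.birth ~ Log.GDP.per.capita, data = .x)) %>%
  map(summary) %>%                 # summary.lm object for each model
  map_df("r.squared") %>%          # extract R-squared from each summary by name
  reactable()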
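purrr 1.0 marks the map_df()/map_dfr() family as superseded. One alternative, sketched here on hypothetical toy data, is to extract each R-squared with map_dbl() and turn the named vector into a long two-column tibble with tibble::enframe(), giving one row per year instead of one column per year.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data: two years, five countries each
toy <- tibble(
  year = rep(c(2019, 2020), each = 5),
  Log.GDP.per.capita = rep(c(8, 9, 10, 11, 12), times = 2),
  Healthy.life.expectancy.at.birth = c(60, 63, 65, 69, 70, 58, 62, 66, 67, 71)
)

r2_by_year <- toy %>%
  split(.$year) %>%                # named list: "2019", "2020"
  map(~ lm(Healthy.life.expectancy.at.birth ~ Log.GDP.per.capita, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")             # named double vector of R-squared values

# Long format: one row per year
r2_table <- enframe(r2_by_year, name = "year", value = "r_squared")
```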

Conclusion

Using the purrr package from the tidyverse, I showed how map() and its variants iterate over a vector, over the columns of a data frame, and over per-year subsets of the data.

Extension

library(tidyverse)
library(plotly)

This example made me curious about how health, income, and happiness vary across the world's countries. I wanted to build visualizations that bring these relationships to life, show how the variables correlate, and test what conclusions those correlations can support.

First, I wanted to see how healthy life expectancy is distributed across countries: which ranges are typical and how many countries fall within them. To do this, I used ggplot2 from the tidyverse to draw a density plot of life expectancy for 2020, the most recent year in the data.

density_plot_2020 <- ggplot(worldhappiness2020, aes(x = Healthy.life.expectancy.at.birth)) +
  geom_density(fill = "green", alpha = 0.5) +
  labs(title = "Density Plot of Life Expectancy in 2020",
       x = "Life Expectancy at Birth",
       y = "Density")

density_plot_2020
## Warning: Removed 3 rows containing non-finite outside the scale range
## (`stat_density()`).

The plot shows that in 2020 most countries had a healthy life expectancy at birth between roughly 65 and 72 years, with the density peaking near 68 years. The warning above reflects the three countries with missing values, which ggplot2 drops before estimating the density.
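A density plot shows the shape of the distribution but not how many countries fall in each range. A minimal sketch of that count, assuming 5-year bands and using hypothetical values, bins the column with cut() and tallies it with count():

```r
library(dplyr)
library(tibble)

# Hypothetical 2020 values, including one missing entry
toy_2020 <- tibble(
  Healthy.life.expectancy.at.birth = c(55.1, 64.2, 68.4, 69.3, 74.2, NA)
)

life_bands <- toy_2020 %>%
  filter(!is.na(Healthy.life.expectancy.at.birth)) %>%
  # cut() assigns each value to a right-closed interval such as (65,70]
  mutate(band = cut(Healthy.life.expectancy.at.birth,
                    breaks = c(50, 60, 65, 70, 75))) %>%
  count(band)  # number of countries per band
```

On the real data, replacing toy_2020 with worldhappiness2020 would give the country count per band.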

To dig deeper, I explored the relationship between GDP per capita and happiness score (the Life.Ladder column). I started with the 2020 data but wanted to check whether the relationship holds across years, so I built an interactive scatter plot with the plotly package that lets you compare years one at a time.

# color = ~as.factor(year) makes plot_ly() draw one trace per year,
# ordered by factor level (ascending year)
years <- sort(unique(worldhappiness$year))

# One dropdown button per year: the "visible" list has one entry per trace,
# so only the selected year's trace is shown
year_buttons <- map(seq_along(years), function(i) {
  list(method = "restyle",
       args = list("visible", as.list(seq_along(years) == i)),
       label = as.character(years[i]))
})

# "All" button that makes every year's trace visible again
all_button <- list(method = "restyle",
                   args = list("visible", as.list(rep(TRUE, length(years)))),
                   label = "All")

Happiness_vs_GDP_Plot <- plot_ly(worldhappiness,
                                 x = ~Log.GDP.per.capita,
                                 y = ~Life.Ladder,
                                 color = ~as.factor(year),
                                 # one color per year; the default Set2 palette has only 8
                                 colors = hcl.colors(length(years), "viridis"),
                                 type = "scatter",
                                 mode = "markers") %>%
  layout(title = "Happiness Score vs. GDP per Capita (All Years)",
         xaxis = list(title = "Log GDP per Capita"),
         yaxis = list(title = "Happiness Score"),
         hovermode = "closest",
         updatemenus = list(
           list(
             buttons = c(list(all_button), year_buttons),
             direction = "down",
             showactive = TRUE,
             x = 0.1,
             xanchor = "left",
             y = 1.1,
             yanchor = "top"
           )
         )
  )

Happiness_vs_GDP_Plot
## Warning: Ignoring 36 observations
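As an alternative to a hand-built dropdown, plotly's R interface accepts a frame argument that animates the scatter plot year by year with a play button and slider. A minimal sketch on hypothetical toy data (the values are placeholders, not taken from the dataset):

```r
library(plotly)
library(tibble)

# Hypothetical toy data standing in for worldhappiness
toy <- tibble(
  year = rep(c(2019, 2020), each = 3),
  Log.GDP.per.capita = c(8.5, 9.8, 11.0, 8.6, 9.9, 11.1),
  Life.Ladder = c(4.2, 5.6, 7.1, 4.0, 5.5, 7.3)
)

# frame = ~year adds a slider that steps through the years,
# drawing one year's points at a time
animated_plot <- plot_ly(toy,
                         x = ~Log.GDP.per.capita,
                         y = ~Life.Ladder,
                         frame = ~year,
                         type = "scatter",
                         mode = "markers") %>%
  layout(xaxis = list(title = "Log GDP per Capita"),
         yaxis = list(title = "Happiness Score"))
```

Printing animated_plot in an R Markdown chunk or the RStudio viewer displays the slider. Compared with the dropdown, frames avoid managing per-trace visibility lists by hand.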

Conclusion

The visualization shows a clear positive relationship between log GDP per capita and happiness score: the points trend upward from the lower left to the upper right, so countries with higher GDP per capita tend to report greater happiness. Comparing 2010 with 2020 shows the same pattern. I used ggplot2 for the static density plot and plotly for the interactive scatter plot.
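To put a number on the visual impression, a per-year Pearson correlation can be computed with group_by() and summarise(). This sketch uses hypothetical toy data with the dataset's column names; on the real data, worldhappiness would replace toy. The argument use = "complete.obs" drops rows where either value is missing.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: two years, one missing happiness value
toy <- tibble(
  year = rep(c(2019, 2020), each = 4),
  Log.GDP.per.capita = c(8, 9, 10, 11, 8, 9, 10, 11),
  Life.Ladder = c(4.5, 5.0, 6.2, 6.9, 4.1, NA, 5.9, 7.0)
)

cor_by_year <- toy %>%
  group_by(year) %>%
  # Pearson correlation per year, ignoring incomplete pairs
  summarise(r = cor(Log.GDP.per.capita, Life.Ladder, use = "complete.obs"))
```

A consistently high r across years would support the conclusion above more firmly than visual inspection alone.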